Accelerated Event-by-Event Neutrino Oscillation Reweighting with Matter Effects on a GPU
Oscillation probability calculations are becoming increasingly CPU intensive
in modern neutrino oscillation analyses. Because each event in a Monte Carlo
sample is reweighted independently of the others, the task lends itself to
parallel implementation on a Graphics Processing Unit. The library "Prob3++"
was ported to the GPU using the CUDA C API, allowing for large-scale
parallelized calculations of neutrino oscillation probabilities through matter
of constant density and decreasing the execution time by a factor of 75
compared to performance on a single CPU.
Comment: Final update: post-submission update. Updated version: quantified the
difference in event rates for binned and event-by-event reweighting with a
typical binning scheme. Improved formatting of reference
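To illustrate why event-by-event reweighting parallelizes so naturally, the following is a minimal sketch and not the Prob3++ or CUDA implementation. It assumes a simplified two-flavor approximation in constant-density matter (the library itself handles three flavors), and the function names, default oscillation parameters, and density are hypothetical values chosen only for illustration. The point is that each event's weight depends only on that event's own energy, so every call is independent and could be assigned to its own GPU thread.

```python
import math


def two_flavor_matter_prob(energy_gev, baseline_km, dm2_ev2=2.5e-3,
                           sin2_2theta=0.085, density_g_cm3=2.6,
                           electron_fraction=0.5):
    """Two-flavor appearance probability in constant-density matter.

    All default parameter values are illustrative assumptions.
    """
    # Ratio of the matter potential term to the vacuum mass splitting:
    # 2*sqrt(2)*G_F*N_e*E ~ 1.52e-4 eV^2 * Y_e * rho[g/cm^3] * E[GeV]
    a = 1.52e-4 * electron_fraction * density_g_cm3 * energy_gev / dm2_ev2
    cos_2theta = math.sqrt(1.0 - sin2_2theta)
    # Common factor that rescales both the mixing and the mass splitting
    scale = math.sqrt((cos_2theta - a) ** 2 + sin2_2theta)
    sin2_2theta_matter = sin2_2theta / scale ** 2
    dm2_matter = dm2_ev2 * scale
    phase = 1.267 * dm2_matter * baseline_km / energy_gev
    return sin2_2theta_matter * math.sin(phase) ** 2


def reweight(events, baseline_km):
    """Return new weights for (energy_gev, weight) pairs.

    Each element is computed without reference to any other event,
    which is the property that maps onto one GPU thread per event.
    """
    return [weight * two_flavor_matter_prob(energy, baseline_km)
            for energy, weight in events]


# Hypothetical Monte Carlo sample: (true energy in GeV, nominal weight)
sample = [(0.4, 1.0), (0.6, 1.0), (1.0, 0.8), (2.5, 1.2)]
new_weights = reweight(sample, baseline_km=295.0)
```

Setting `density_g_cm3=0.0` recovers the vacuum two-flavor formula, which gives a quick sanity check on the matter correction.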
GPU Concurrency: Weak Behaviours and Programming Assumptions
Concurrency is pervasive and perplexing, particularly on graphics processing units (GPUs). Current specifications of languages and hardware are inconclusive; thus programmers often rely on folklore assumptions when writing software.
To remedy this state of affairs, we conducted a large empirical study of the concurrent behaviour of deployed GPUs. Armed with litmus tests (i.e. short concurrent programs), we questioned the assumptions in programming guides and vendor documentation about the guarantees provided by hardware. We developed a tool to generate thousands of litmus tests and run them under stressful workloads. We observed a litany of previously elusive weak behaviours, and exposed folklore beliefs about GPU programming---often supported by official tutorials---as false.
As a way forward, we propose a model of Nvidia GPU hardware, which correctly models every behaviour witnessed in our experiments. The model is a variant of SPARC Relaxed Memory Order (RMO), structured following the GPU concurrency hierarchy.
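As an illustration of what a litmus test checks, the sketch below enumerates the outcomes of the classic message-passing test on a host CPU. It is not the authors' tool or their proposed model: the "relaxed" variant here is a toy assumption that simply allows each thread's two operations to be reordered, and all names are hypothetical. It shows how an outcome forbidden under sequential consistency becomes observable once reordering is permitted.

```python
from itertools import permutations

# Message-passing litmus test: x is the data, y is the flag, both start at 0.
# Thread 0 writes the data and then sets the flag.
T0 = [("write", "x", 1), ("write", "y", 1)]
# Thread 1 reads the flag into r1 and then the data into r2.
T1 = [("read", "y", "r1"), ("read", "x", "r2")]


def interleavings(a, b):
    """Yield every merge of a and b that preserves each sequence's order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one global schedule against a single shared memory."""
    memory = {"x": 0, "y": 0}
    registers = {}
    for op, location, arg in schedule:
        if op == "write":
            memory[location] = arg
        else:
            registers[arg] = memory[location]
    return registers["r1"], registers["r2"]


def outcomes(t0_orders, t1_orders):
    """Collect every (r1, r2) reachable from the allowed per-thread orders."""
    seen = set()
    for t0 in t0_orders:
        for t1 in t1_orders:
            for schedule in interleavings(t0, t1):
                seen.add(run(schedule))
    return seen


# Sequential consistency: each thread keeps its program order.
sc = outcomes([T0], [T1])
# Toy relaxed model (assumption): operations within a thread may be reordered.
relaxed = outcomes(list(permutations(T0)), list(permutations(T1)))

# The weak outcome: flag observed as set, but data still stale.
weak = (1, 0)
print("allowed under SC:     ", weak in sc)
print("allowed when relaxed: ", weak in relaxed)
```

An empirical study replaces this enumeration with many runs on real hardware under stress, and asks whether the weak outcome is ever observed.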